Refactor apriori algorithm #14579

Open
JossGeek wants to merge 5 commits into TheAlgorithms:master from JossGeek:refactor-apriori-algorithm

Conversation

@JossGeek

Describe your change:

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes Refactor the Apriori Algorithm #14577".

algorithms-keeper (bot) added labels on Apr 24, 2026: awaiting reviews (This PR is ready to be reviewed), enhancement (This PR modified some existing files)
Comment thread machine_learning/apriori_algorithm.py Outdated
# ---------- Helpers ----------


def get_support(itemset, transactions):
Good extraction of get_support() as a standalone helper — makes the
logic much easier to test in isolation. However, it's missing type
hints and a docstring. Suggestion:

def get_support(itemset: frozenset, transactions: list[set]) -> int:
    """Return the number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset.issubset(t))
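Since the checklist asks for doctests on every function, the suggested helper could also carry one. The following is only a sketch: the `str` element type and the sample baskets are assumptions made for illustration, not something taken from the PR.

```python
def get_support(itemset: frozenset[str], transactions: list[set[str]]) -> int:
    """Return the number of transactions that contain every item in ``itemset``.

    The sample data below is hypothetical and only serves the doctest.

    >>> baskets = [{"milk", "bread"}, {"milk"}, {"bread", "eggs"}]
    >>> get_support(frozenset({"milk"}), baskets)
    2
    >>> get_support(frozenset({"milk", "eggs"}), baskets)
    0
    """
    # A transaction supports the itemset when the itemset is a subset of it.
    return sum(1 for t in transactions if itemset.issubset(t))


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

The built-in generic syntax (`frozenset[str]`, `list[set[str]]`) needs Python 3.9 or newer.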

Author


Got it!

for t in transactions:
    for c in candidates:
        if c.issubset(t):
            candidate_counts[c] += 1

Using defaultdict(int) here is cleaner than the old counts = [0] *
len(itemset) approach — no more index tracking with enumerate().
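To make the comparison concrete, here is a minimal sketch of both counting styles side by side. The transactions and candidates are made-up illustration data, and the index-based loop is a reconstruction of the general pattern rather than the exact code that was removed.

```python
from collections import defaultdict

# Hypothetical sample data, not taken from the PR.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
candidates = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b", "c"})]

# Index-based counting: each count lives at the position of its candidate,
# so the two lists have to be kept in sync through enumerate().
counts = [0] * len(candidates)
for t in transactions:
    for i, c in enumerate(candidates):
        if c.issubset(t):
            counts[i] += 1

# defaultdict-based counting: each count is keyed by the candidate itself.
candidate_counts = defaultdict(int)
for t in transactions:
    for c in candidates:
        if c.issubset(t):
            candidate_counts[c] += 1

print(counts)                  # [3, 2, 1]
print(dict(candidate_counts))  # same counts, keyed by frozenset
```

One behavioral difference to keep in mind: a candidate that matches no transaction gets a `0` entry in the list version but no key at all in the `defaultdict` version.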

One suggestion: the support counting loop (lines 93-96) could use
the new get_support() helper you defined earlier to avoid duplication
and keep the main apriori() function cleaner:

frequent = {
c: get_support(c, transactions)
for c in candidates
if get_support(c, transactions) >= min_support
}

Author

@JossGeek JossGeek Apr 27, 2026


Current logic

The current implementation is already efficient:

for t in transactions:
    for c in candidates:
        if c.issubset(t):
            candidate_counts[c] += 1

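It passes over each transaction once and counts all candidates in that pass, so the transaction list is never rescanned. The number of issubset() checks is the same as calling get_support() once per candidate (len(transactions) × len(candidates)), so the gain is in scans of the data rather than in comparisons.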

Your suggestion

frequent = {
    c: get_support(c, transactions)
    for c in candidates
    if get_support(c, transactions) >= min_support
}

It calls get_support() twice for every candidate that passes the filter, which doubles the cost for those candidates:

  • once in if
  • once in the value

So it improves readability, but not performance.
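The extra calls can be checked with a small experiment. This is a sketch under assumptions: the call counter, the sample data, and the threshold are all hypothetical additions for illustration.

```python
# Hypothetical call counter; a one-element list so the helper can mutate it.
call_count = [0]


def get_support(itemset, transactions):
    """Count supporting transactions and record that the helper was called."""
    call_count[0] += 1
    return sum(1 for t in transactions if itemset.issubset(t))


# Made-up sample data.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
candidates = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b", "c"})]
min_support = 2

frequent = {
    c: get_support(c, transactions)                  # second call, only if kept
    for c in candidates
    if get_support(c, transactions) >= min_support   # first call, always
}

print(call_count[0])  # 5: three filter calls plus two for the kept candidates
```

With three candidates, two of which reach `min_support`, the helper runs five times instead of three, so the overhead grows with the number of frequent candidates.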

Aligned with your suggestion

candidate_counts = {}

for c in candidates:
    support = get_support(c, transactions)
    if support >= min_support:
        candidate_counts[c] = support

Here we compute the support only once per candidate, which avoids the duplicated call while keeping the logic readable. This way we get both readability and performance.
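If the dict-comprehension form is still preferred, an assignment expression (Python 3.8+) is one possible way to keep it while calling the helper once per candidate. This is only a sketch: the function name `frequent_itemsets` and the sample data are hypothetical, not part of the PR.

```python
def get_support(itemset, transactions):
    """Return the number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset.issubset(t))


def frequent_itemsets(candidates, transactions, min_support):
    """Keep candidates whose support reaches min_support (hypothetical helper)."""
    return {
        c: support  # reuses the value bound in the filter below
        for c in candidates
        if (support := get_support(c, transactions)) >= min_support
    }


# Made-up sample data.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
candidates = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b", "c"})]

print(frequent_itemsets(candidates, transactions, min_support=2))
```

The filter clause runs before the key and value expressions, so `support` is already bound when the entry is built. Whether this reads better than the explicit loop is a matter of taste.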

JossGeek and others added 2 commits April 27, 2026 16:13
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>

Development

Successfully merging this pull request may close these issues.

Refactor the Apriori Algorithm

2 participants